arXiv:1506.01115v1 [cs.CV] 3 Jun 2015


Hyperspectral Image Classification and Clutter Detection via 
Multiple Structural Embeddings and Dimension Reductions 


Alexandros-Stavros Iliopoulos* Tiancheng Liu^ Xiaobai Sun* 


Abstract 

We present a new and effective approach for Hyperspectral Image (HSI) classification and clutter detection, overcoming
a few long-standing challenges presented by HSI data characteristics. Residing in a high-dimensional spectral
attribute space, HSI data samples are known to be strongly correlated in their spectral signatures, exhibit nonlinear 
structure due to several physical laws, and contain uncertainty and noise from multiple sources. In the presented 
approach, we generate an adaptive, structurally enriched representation environment, and employ the locally linear 
embedding (LLE) in it. There are two structure layers external to LLE. One is feature space embedding: the HSI data
attributes are embedded into a discriminatory feature space where spatio-spectral coherence and distinctive structures 
are distilled and exploited to mitigate various difficulties encountered in the native hyperspectral attribute space. The 
other structure layer encloses the ranges of algorithmic parameters for LLE and feature embedding, and supports a 
multiplexing and integrating scheme for contending with multi-source uncertainty. Experiments on two commonly 
used HSI datasets with a small number of learning samples have rendered remarkably high-accuracy classification 
results, as well as distinctive maps of detected clutter regions. 


1 Introduction 

We are concerned in this paper with analysis of hyperspectral imaging (HSI) data; in particular, we address the task of 
high-accuracy multi-class labeling, as well as clutter detection as a necessary complement. 

Enabled by advanced sensing systems, such as the NASA/JPL AVIRIS [11], NASA Hyperion [23], and DLR 
ROSIS [13] sensors, hyperspectral imaging, also known as imaging spectroscopy, pertains to the acquisition of high- 
resolution spectral information over a broad range, providing substantially richer data than multi-spectral or color 
imaging. HSI combines spectral with spatial information, as samples are collected over large areas, at increasingly fine 
spatial resolution. With its rich information provision and non-invasive nature, HSI has become an invaluable tool for 
detection, identification, and classification of materials and objects with complex compositions. Relevant application 
fields include material science, agriculture, environmental and urban monitoring, resource discovery and monitoring, 
food safety and security, and medicine [4,14,21]. As sensing technologies continue to advance, HSI is providing
larger collections of data to facilitate and enable scientific and engineering inquiries that were previously unfeasible. 
At the same time, it challenges many existing data analysis methods to render high-quality results commensurate with 
the richness of available information in HSI data. 

Among the key challenging factors for HSI data analysis are: the curse of dimensionality of the spectral feature 
space, which hampers class discrimination (Hughes effect [15]) and exacerbates the computational complexity of the 
analysis process; strong and nonlinear spatio-spectral correlations and mixing across spectral bands, as well as
cross-mixing between spatial pixels and spectral bands [3]; and multiple sources of noise and uncertainty with regard to the
imaged scene and acquisition process [1]. 

A host of data analysis approaches have been investigated for use in HSI classification [3]. We may roughly 
categorize them according to the feature space where classification takes place and whether or not the corresponding 
models are linear. For example, band selection and linear combination techniques for classification [16,19,26] reduce 
the dimensionality of the spectral attribute space based on linear signal and image models. Kernel-based classifiers, 
such as SVMs [6,20], respect nonlinearity, applying a nonlinear transform to the data attributes and embedding them in 
a high-dimensional classification space. While such methods can be effective with certain data, they can be sensitive to
the chosen embedding kernel and the number or distribution of available training samples. A different approach is that
of manifold learning methods [1,21,22,24,28], where high-dimensional embedding of the data samples is followed
by dimensionality reduction. Such methods assume that a (principal) manifold structure underlies the collected data 
samples; subsequent analyses are then based on the isometric principles associated with manifolds. Another important 
assumption is that the features lie in a well-defined metric space; manifold learning methods are sensitive to the choice 
of metric for neighborhood definition, as well as to the density and distribution of data samples. Indeed, a naive 
application of such an approach to HSI data may suffer from the high correlation and various uncertainty sources in 
the hyperspectral attribute space. It should be noted, additionally, that some algorithms incur too high a computational 
cost for them to be practical for HSI data analysis, even more so as the spatial coverage and resolution of hyperspectral
sensors are increasing.

We address here the aforementioned standing issues in HSI classification: (i) nonlinear correlation and irregular
singularities, (ii) multiple-source uncertainties with respect to the HSI data structure, and (iii) high data dimensionality. 
The first problem is in part responsible for an existing gap between HSI data collection and analysis: while spectral 
and spatial information is coupled in HSI scenes, it is typically processed in a decoupled manner. From an alternative 
perspective, the strong correlation in HSI data can be exploited to help overcome the other two challenges. In our 
approach, we start with exploring and utilizing the spatial and spectral coherence of HSI data in tandem. There are 
various methods that attempt to incorporate spatial coherence [10,17,21,27] in the analysis process; these approaches
can be seen as special or extreme cases in the framework we introduce in this paper. 

There are three key components in our framework for HSI classification and clutter detection: (i) The Locally 
Linear Embedding (LLE) method of Roweis and Saul [24] provides the basic computational procedure for deriving a
manifold representation of the data; we review LLE in section 2 and comment on our interpretation, our rationale for its
selection, and its specific form within our method. (ii) Prior to the LLE computations, we embed the HSI samples into a
structural feature space using efficient, local filters to highlight their spatio-spectral structure, thus exposing potential 
discriminatory singularities and contending with noise in the data, while avoiding de-correlation; we describe the 
feature embedding concept and its connection to the LLE processing in section 3. (iii) We consider an ensemble of 
structural embeddings and representations, defined by multiple parameter instances for the other two components, to 
counteract the effect of multiple uncertainties; we describe in section 4 the relevant ensemble parameters, as well as 
our scheme for multiplexing and integrating the results over all instances. 

Experimental results with our approach are presented in section 5. They demonstrate evidently high-accuracy 
classification and clutter detection. Indeed, the estimated clutter maps we extract appear to be the first of their kind 
in the context of HSI classification. Clutter areas shape boundaries and delineate coherent, labeled regions; they may 
also contain objects of interest or new classes to be analyzed, and may be of higher value to various data analysis
applications. We consider clutter maps, such as the ones presented in this paper, as critical information that complements
classification in the traditional sense. The joint provision of classification and clutter detection estimates serves to 
make HSI data analysis independent of artificial or impractical conditions, and impacts the rendering of higher quality, 
interpretable analysis results. 


2 The LLE method for classification 

The core processing module for HSI structure encoding and classification in our approach is the Locally Linear
Embedding (LLE) method of Roweis and Saul [24]. The basic assumption behind it is that a set of data samples in
a high-dimensional space of observable attributes is distributed over an underlying low-dimensional manifold; LLE
may then be used to map the data samples to the principal manifold coordinate space, or parameter space. This
assumption conforms well to HSI data, owing to their non-linear, correlated structure, as per the physical laws of
radiative transfer and sensor properties and calibration [1,3], whereas direct use of linear dimension reduction models
is ill-suited for HSI data analysis. 

LLE has rendered surprisingly good results in classification or clustering of synthetic data samples on
low-dimensional manifolds (e.g. Swiss roll) and certain image data (such as handwritten digits and facial pose or
illumination) [8,24]. Several theoretical interpretations and algorithmic extensions have been proposed for LLE [2,9],
and it is increasingly applied to domain-specific data analysis tasks. HSI classification ranks among such tasks [18,21],
albeit scarcely. 

In this work, we adopt LLE as a core procedure for HSI classification due to three of its remarkable
properties: (i) the natural connection between a globally connected embedding of local geometric structures and sparse
coding; (ii) the translation invariance of local geometry encoding and its preservation by dimensionality reduction; and
(iii) the strikingly simple and computationally efficient algorithmic structure. We briefly describe the LLE processing
steps and remark on certain aspects based on our interpretation. 

Let $X = [x_1 \cdots x_j \cdots x_N]$, where $x_j \in \mathbb{R}^D$, be a set of N samples in a D-dimensional feature space. First,
a set of neighboring samples, denoted by $\mathcal{N}_j$, is located for every sample, $x_j$. We employ the k-nearest neighbors
(kNN) scheme because of its relative insensitivity to sample density; our measure for neighborhood definition is based
on angular (cosine) similarity.
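
The neighbor search described above can be sketched as follows. This is a minimal, brute-force illustration in plain Python; the function names `cosine_similarity` and `knn_cosine`, the toy samples, and the tie-breaking rule are assumptions made here for illustration and are not taken from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_cosine(samples, j, k):
    """Indices of the k samples most similar to samples[j], excluding j itself."""
    scored = [(cosine_similarity(samples[j], x), i)
              for i, x in enumerate(samples) if i != j]
    # Sort by decreasing similarity; ties are broken by index (an assumption).
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]

# Toy 2-D samples: samples 1 and 3 point in nearly the same direction as sample 0.
samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.3]]
neighbors = knn_cosine(samples, 0, 2)
```

Because the similarity depends only on the angle between samples, uniformly scaling a spectrum (for example by a brightness factor) leaves its neighborhood unchanged.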

The local geometry around each sample point, $x_j$, is then encoded by a vector of local coefficients (weights).
These coefficients place $x_j$ at the neighborhood barycenter and the corresponding vector is numerically orthogonal to
the tangent plane spanned by its neighbors about the center. Specifically, the local weights, $w_{ij}$, are determined by the
following local least squares problem, subject to the affine combination condition:

$$ \min_{\{w_{ij}\}} \Big\| x_j - \sum_{i \in \mathcal{N}_j} x_i w_{ij} \Big\|_2 \quad \text{s.t.} \quad \sum_{i \in \mathcal{N}_j} w_{ij} = 1, \qquad (1) $$

for all $j \in \{1, \dots, N\}$. The affine combination not only makes the sample point the neighborhood barycenter, but also
means that the local encoding is translation invariant.
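
A minimal sketch of the local weight computation for a single sample is given below. It follows the commonly used route of solving a small linear system with the local Gram matrix and then rescaling the weights to sum to one. The ridge term `reg`, the helper `solve_linear`, and the toy data are assumptions introduced here; the paper does not state how (or whether) the local systems are regularized.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def lle_weights(x_j, neighbors, reg=1e-3):
    """Affine reconstruction weights of x_j from its neighbors (they sum to 1)."""
    k = len(neighbors)
    # Only differences to x_j enter, which is why the encoding is translation invariant.
    diffs = [[a - b for a, b in zip(nb, x_j)] for nb in neighbors]
    gram = [[sum(p * q for p, q in zip(diffs[a], diffs[b])) for b in range(k)]
            for a in range(k)]
    trace = sum(gram[a][a] for a in range(k))
    ridge = reg * trace if trace > 0 else reg  # assumed regularization
    for a in range(k):
        gram[a][a] += ridge
    w = solve_linear(gram, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]

# Toy example: x_j lies inside the triangle formed by its three neighbors.
x_j = [0.2, 0.1]
neighbors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = lle_weights(x_j, neighbors)
```

In this toy case the weights come out close to the barycentric coordinates of `x_j` in the triangle, with a small deviation caused by the assumed ridge term.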

Equation (1) may be rewritten in matrix form as

$$ \min_{W} \| X (I - W) \|_F \quad \text{s.t.} \quad e^T (I - W) = 0, \qquad (2) $$

where $\|\cdot\|_F$ is the Frobenius norm, I is the identity matrix, e is the constant-1 vector, and W is an N x N matrix,
$W = [w_{ij}]$.

Once W is computed, the left singular vectors, Y, corresponding to the d+1 smallest singular values of (I - W)
are obtained:

$$ \min_{Y} \| Y (I - W) \|_F \quad \text{s.t.} \quad Y Y^T = I_{d+1}, \qquad (3) $$

where d < D is the reduced dimensionality. The low-dimensional representation, Y, of the data samples preserves
local geometry and global connectivity as encoded in (I - W).
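
The link between the minimization in eq. (3) and the singular vectors of (I - W) can be made explicit with a short derivation. The symbol M is introduced here for illustration only; squaring the objective does not change its minimizer.

```latex
\| Y (I - W) \|_F^2
  = \operatorname{tr}\!\big( Y (I - W)(I - W)^T Y^T \big)
  = \operatorname{tr}\!\big( Y M Y^T \big),
\qquad M = (I - W)(I - W)^T .
% With Y Y^T = I_{d+1}, the trace is minimized when the rows of Y are
% eigenvectors of M for its d+1 smallest eigenvalues, i.e. left singular
% vectors of (I - W) for its d+1 smallest singular values.
% The affine constraint e^T (I - W) = 0 of eq. (2) implies
e^T M = \big( e^T (I - W) \big) (I - W)^T = 0 ,
% so the constant vector e is always among the minimizers, with singular value zero.
```

This is why d+1 vectors are extracted: one direction is taken up by the constant vector, which carries no discriminatory information.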

Finally, a classifier is employed to label the data in the low-dimensional manifold parameter space. We use a 
simple nearest-neighbor classifier to investigate the efficacy of the embedding and dimension reduction process with 
respect to classification. 

A few additional remarks: The sparsity pattern of the weight matrix, W, is determined by the kNN search in
the first step, while the corresponding numerical values of W are determined via eq. (1) in a local, column-wise 
independent fashion. More importantly, W, as per eq. (2), encodes the global interconnection of local hyperplanes 
via the transitive property of neighborhood connections, without entailing the explicit, computationally expensive 
calculation of all pairwise shortest connection paths. The W matrix can also be seen as a simple kernel-based
embedding. The low-dimensional space spanned by Y includes constant-valued vectors, corresponding to the zero singular
value, whose geometric multiplicity may be greater than 1. The discriminatory information lies in the d-dimensional 
subspace that is orthogonal to the constant vector, e. 

3 Feature space embedding 

HSI data samples are known to be strongly correlated in their spectral signatures [1,4,19]. Strong correlation between 
features complicates the choice of a discriminatory distance or similarity metric, particularly so in a high-dimensional 
setting. Furthermore, nonlinearity and high dimensionality render de-correlation attempts ineffective. Increasing the 
learning sample density is impractical and may yield limited improvements; learning from sparse reference sample 
subsets is desired, instead. 

We take a novel approach, namely structural feature embedding, to alleviate these fundamental issues. We explore 
the spatio-spectral coherence structure of HSI data, and embed the spectral attribute space in a structure-rich space, 
where data-specific features may be made more salient. Then, a conventional distance metric in the embedding feature
space may be seen as an ad hoc discriminatory one in the original attribute space. Moreover, the computational
complexity for structural feature embedding scales linearly with the dataset size, which is much more efficient than
that of even linear de-correlation.

Table 1: HSI dataset summary.

                               Spectral domain (nm)             Spatial domain (m^2)
Dataset         Sensor         range     resolution  #bands     #pixels  resolution    #classes  Labeled area coverage
Indian Pines    AVIRIS [11]    410-2450  10          220        145x145  200           16        49.4%
Univ. of Pavia  ROSIS [13]     430-860   4           103        610x340  1.7           9         20.6%

Specifically, we explore spatial and spectral coherence by using a bank of filters. Formally, the filters define a set
of basis (or transform) functions, $\Phi$, such that the embedded data become

$$ \tilde{X} = \Phi(X) = [\phi_1(X) \cdots \phi_i(X) \cdots \phi_m(X)]^T, \qquad (4) $$

where each basis, $\phi_i$, is local with respect to the spatial and/or spectral domain of the HSI dataset, X. Thus, the
embedded feature space may be efficiently computed, removing certain noise components while preserving the
underlying manifold structure. The distance or similarity between any two samples is then measured in the embedded
feature space.

Feature transformation and embedding directly impact the metric for neighborhood definition and subsequent
encoding of local geometry. A closely related notion with respect to the spatial properties of the HSI is the
spatially coherent distance function introduced by Mohan et al. [21], where it is proposed that distance calculations be
performed using all features in a local, ordered patch around each pixel. Here, we introduce the notion of feature
embedding as a basic mechanism for effecting a data-specific geometric metric by means of a conventional metric, thus
circumventing the explicit definition of new, complicated metrics. Note, for example, that employing the patch-based
spatially coherent distance of Mohan et al. is equivalent to applying a box filter to each HSI band prior to distance
calculations, except that the latter is insensitive to the particular ordering of pixels within the patch, making similarity
discovery more robust with respect to local composition variations and object boundaries.

In general, the feature transform basis functions, or simply filters, can be divided into two groups: generic ones 
that may be useful to any HSI analysis task, and data- or analysis-specific filters, depending on one’s objective. The 
filters can also be grouped according to their geometric and statistical features. We consider two particular types of
spectral filters: differential and integral. Differential filters elucidate local characteristics of the spectral signature of
each sample, and generally down-weight spurious similarity contributions induced by correlation between consecutive
spectral bands. Integral filters, on the other hand, may be used to extract statistical, noise-insensitive properties of 
spectral signatures. 

This embedding mechanism allows us to probe the HSI data at different scales, depending on the support and 
order of the spatial or spectral filters; hence, the hyperspectral data are embedded in a feature space that captures their 
structure at the relevant scale. In the experiments carried out in this paper, we use spatial box filtering, and extend the 
spectral features with their numerical gradient and first two statistical moments (mean and standard deviation). 
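
The filters named above can be sketched in a few lines of plain Python. The snippet below is only an illustration under stated assumptions: the gradient uses central differences with one-sided differences at the ends, the standard deviation is the population form, and the box filter clips its window at the image borders. None of these details are specified in the paper, and the function names are hypothetical.

```python
import math

def spectral_gradient(spectrum):
    """Numerical gradient along the band axis (unit band spacing assumed)."""
    n = len(spectrum)
    grad = [0.0] * n
    grad[0] = spectrum[1] - spectrum[0]        # forward difference at the start
    grad[-1] = spectrum[-1] - spectrum[-2]     # backward difference at the end
    for b in range(1, n - 1):
        grad[b] = (spectrum[b + 1] - spectrum[b - 1]) / 2.0  # central difference
    return grad

def embed_spectrum(spectrum):
    """Concatenate original bands, their gradient, and [mean, standard deviation]."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / n)
    return list(spectrum) + spectral_gradient(spectrum) + [mean, std]

def box_filter(band, p):
    """Mean over a p x p window around each pixel; the window is clipped at borders."""
    rows, cols, half = len(band), len(band[0]), p // 2
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [band[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out

# A 4-band toy spectrum becomes a 10-dimensional feature vector (4 + 4 + 2).
features = embed_spectrum([1.0, 2.0, 4.0, 7.0])
# A 2 x 2 toy band smoothed with a 3 x 3 box filter.
smoothed = box_filter([[1.0, 2.0], [3.0, 4.0]], 3)
```

Each output value depends only on a fixed-size local window, so the cost of such an embedding grows linearly with the number of pixels and bands.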

A few remarks are in order on the computation of local neighborhoods. Obtaining the local neighborhoods, $\mathcal{N}_j$,
which directly affect the estimated manifold structure and parameters, amounts to computation of all k-nearest neighbor
sets among the hyperspectral samples. This starts to become problematic as the size of the HSI increases, due to
the high computational cost of kNN searching in the high-dimensional embedded (or original) feature space. Based
on the spatial coherence of HSIs—and given that the size of each local neighborhood should be relatively small for 
the approximately linear structure assumption to hold in its vicinity—we circumvent this issue by bounding the search 
for spectral neighbors within an ample spatial window centered around each pixel. 
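
The window-bounded search can be illustrated by listing the candidate pixels for one query pixel. This is a sketch only; the function name `window_candidates` and the clipping of the window at the image borders are assumptions. The 145 x 145 image size and the 51 x 51 window used in the example are the values reported for the Indian Pines experiments in this paper.

```python
def window_candidates(row, col, n_rows, n_cols, window):
    """Pixel coordinates inside a window x window box centered at (row, col).

    The box is clipped at the image borders and the center pixel is excluded.
    """
    half = window // 2
    r0, r1 = max(0, row - half), min(n_rows - 1, row + half)
    c0, c1 = max(0, col - half), min(n_cols - 1, col + half)
    return [(r, c)
            for r in range(r0, r1 + 1)
            for c in range(c0, c1 + 1)
            if (r, c) != (row, col)]

# Candidate counts for a 145 x 145 scene with a 51 x 51 window.
interior = window_candidates(72, 72, 145, 145, 51)
corner = window_candidates(0, 0, 145, 145, 51)
all_other_pixels = 145 * 145 - 1
```

Similarity ranking is then carried out only among these candidates, so the per-pixel search cost is bounded by the window area rather than by the size of the whole scene.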


4 Structural algorithm ensemble 

As has already been mentioned, there are multiple sources that introduce variations and uncertainty to the underlying 
HSI manifold structure. To name a few, such variations may stem from scattering, atmospheric conditions, spectral 
mixing of material constituents, etc. [1,3]. Another related issue is that HSI samples pertaining to different compounds
may be distributed inhomogeneously along the manifold surface. The introduction of uncertainty from a diverse set of
sources to the observed HSI attributes means that the sample manifold will tend to exhibit multi-scale structure. These 
considerations motivate us to probe the HSI data at different scales in order to uncover the underlying structure. 

The derived HSI representation depends on several parameters in all stages of the embedding and dimension 
reduction procedure, each capturing different properties of the HSI manifold: (i) the choice of spatial and spectral 
filter parameters determines the type and scale of features that define similarity between samples; (ii) the size of local 
neighborhoods, relative to the sample distribution density around each sample, defines the coarseness and connectivity 
of the manifold encoding in the embedded feature space; and (iii) the dimensionality of the parametrized manifold 
representation affects the type of manifold features that are used for classification. 

We define a relevant search space for the set of these algorithmic parameters and obtain an ensemble of structural 
embeddings and low-dimensional manifold representations of the HSI data. For all HSI samples, we find the label of 
their nearest reference sample in each representation instance. This set of proximity labels is then used to obtain the 
classification results, together with a clutter map estimate. 

4.1 Classification entropy and clutter estimation 

Hyperspectral image scene classification methods typically assign each pixel in the imaged scene to one of the classes 
for which labeled reference samples in the scene (also known as ground truth) are available. Oftentimes, however, a 
large portion of the HSI may be comprised of pixels that belong to none of the labeled classes; these pixels constitute 
clutter with respect to the specified label-set. Clutter pixels are likely diverse in terms of their spectral features, and 
cannot generally be considered to correspond to a single, new class. A related but somewhat different approach is taken 
in the context of anomaly detection. There, identification of the “clutter” (anomalous) region typically depends on the 
collection and utilization of statistical properties of relevant scenes, obtained from a large set of learning examples [7]. 
Here, we do not require additional data beyond those in a single HSI data cube, and restrict the reference/learning 
samples, used for classification, to a sparse subset of available data samples. 

For classification and clutter detection, we first obtain a classification entropy score for every non-reference pixel, 
as follows. Each non-reference pixel is matched to its nearest (in the low-dimensional classification space) reference 
pixel, for all instances, or trials, that make up our ensemble. Hence, given a total of T trials, each pixel is associated 
with a vector of T proximity labels. This vector is converted to a frequency vector of length L, where L is the number 
of labeled classes. Let $c_{jl}$ be the count of the l-th label, $l \in \{1, \dots, L\}$, in the proximity-label vector of the j-th pixel,
and $f_{jl} = c_{jl} / T$ be the corresponding relative frequency. Taking an information-theoretic approach, we define the
classification entropy for the j-th pixel as

$$ H_j = - \sum_{l=1}^{L} f_{jl} \log_L f_{jl}. \qquad (5) $$

The classification entropy score $H_j$ lies in [0,1]. At one extreme ($H_j = 0$), the labeling frequency vector of the
j-th pixel has only one non-zero element, meaning that all of its proximity labels are the same. At the other extreme
($H_j = 1$), the frequency vector is constant, meaning that all proximity labels for the pixel are equally frequent among
the T instances or trials. Empirically, $H_j$ measures the classification ambiguity of the j-th pixel. A pixel with a high
classification entropy score is most likely a clutter pixel, whereas a pixel with a low score is likely to belong to one of
the available classes. The $H_j$ scores for all pixels can be displayed as a grayscale image, providing a classification
entropy map for a given experimental ensemble (see section 5.4).
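
The entropy score can be computed directly from the proximity labels of a pixel. The sketch below assumes a base-L logarithm, which is what makes the score range over [0,1] as stated above, and the usual convention that labels with zero frequency contribute nothing. The function name and the toy label vectors are illustrative.

```python
import math
from collections import Counter

def classification_entropy(proximity_labels, num_classes):
    """Normalized entropy of the label frequencies over T trials (num_classes >= 2)."""
    total = len(proximity_labels)
    entropy = 0.0
    # Labels that never occur have zero frequency and are simply skipped.
    for count in Counter(proximity_labels).values():
        f = count / total
        entropy -= f * math.log(f, num_classes)
    return entropy

# Four trials, four classes.
unanimous = classification_entropy([1, 1, 1, 1], 4)   # all trials agree
uniform = classification_entropy([1, 2, 3, 4], 4)     # every class equally frequent
mixed = classification_entropy([1, 1, 1, 2], 4)       # mostly one class
```

A unanimous pixel scores zero, a pixel whose labels are spread evenly over all classes scores one, and partially contested pixels fall in between.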

Using a threshold, $\tau_{clt}$, we make use of the classification entropy map to split the HSI scene into two complementary
parts: clutter regions ($H_j > \tau_{clt}$), where no label is given to the corresponding pixels, and labeled regions ($H_j \le \tau_{clt}$),
where each pixel is matched to the available classes. While a diverse set of methods has been proposed for combining 
results in multiple classifier systems [5,12,29,30], most rely on the availability of enough training data or knowledge of 
certain statistical properties of the dataset and/or classifiers, which may not be the case in many practical applications. 
Here, we assign each pixel to the class most frequently returned for it across all classifier
instances. This simple rule provides us with a baseline regarding the performance of our methodology; moreover, it
does not entail additional assumptions or abundance of labeled data, and we have found it to generally improve upon 
any single classifier instance throughout our experiments. 
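
The consensus rule and the entropy threshold can be combined per pixel as in the following sketch. The tie-breaking rule (smallest label wins), the use of `None` to mark clutter, and the function names are assumptions made for illustration; the entropy value is taken as a precomputed input.

```python
from collections import Counter

def consensus_label(proximity_labels):
    """Most frequently returned label; ties go to the smallest label (an assumption)."""
    counts = Counter(proximity_labels)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)

def label_or_clutter(proximity_labels, entropy, threshold, clutter=None):
    """Return the consensus label, or the clutter marker if entropy exceeds the threshold."""
    if entropy > threshold:
        return clutter
    return consensus_label(proximity_labels)

# A confidently labeled pixel and an ambiguous one, with a threshold of 0.25.
confident = label_or_clutter([2, 2, 2, 2], entropy=0.0, threshold=0.25)
ambiguous = label_or_clutter([2, 1, 2, 3], entropy=0.75, threshold=0.25)
```

The unweighted vote needs no training data beyond the reference samples already used by each instance, which is the property the text emphasizes.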

Figure 1: Classification and clutter detection results for the Indian Pines scene. (a) and (e) RGB composite [25] and
manual classification labeling and mask. (b)-(d) 10% labeled data sampling: masked classification; classification
entropy map; classification and clutter removal with $\tau_{clt}$ = 0.25. (f)-(h) 5% labeled data sampling: same as (b)-(d)
with $\tau_{clt}$ = 0.30.


5 Experiments 

5.1 Datasets 

Two publicly available HSI datasets have been used to appraise the effectiveness of our approach. One is the Indian 
Pines¹ scene, recorded by the AVIRIS sensor [11] in Northwestern Indiana, USA. It consists mostly of agricultural
plots (alfalfa, corn, oats, soybean, wheat), and forested regions (woods, and different sub-classes of grass), while a few 
buildings may also be seen. Several classes exhibit significant spectral overlap, as they correspond to the same basic 
class under different conditions. 

The other is the University of Pavia² scene, recorded by the ROSIS sensor [13] in Pavia, Italy. It covers an
urban environment, with various solid structures (asphalt, gravel, metal sheets, bitumen, bricks), natural objects (trees, 
meadows, soil), and shadows. Objects whose compositions differ from the labeled ones are considered as clutter. 

Both datasets are available with a manually labeled mask, where each pixel is assigned a class (color) or is
discarded as clutter (black). An RGB composite image and the labeled mask for the two datasets are shown in figs. 1a,
2a, 1e and 2e. A summary of relevant parameters for the two datasets may be found in table 1, and the corresponding
reference label maps are shown in appendix A. 

5.2 Experimental set-up 

Prior to any other processing, 24 noisy bands were removed from the Indian Pines dataset; these correspond to 20 
water absorption bands [26] and another 4 that were dominated by noise. All bands were kept for the University of 
Pavia dataset (albeit 12 have already been removed from the data in the public repository). 

For all experiments presented here, the algorithmic ensemble parameters were as follows: (i) The spectral bank
consisted of the identity (i.e. the original attributes were used), numerical gradient, mean, and standard deviation
filters; spectral features were extracted at two scales using the {whole, odd, even} spectrum. (ii) A spatial box filter was
applied to all features, using a p x p neighborhood, where $p \in \{3, 5\}$. (iii) The size of local manifold neighborhoods
was $k \in \{5, 10, 15\}$. (iv) The dimension of the manifold-representation classification space was $d \in \{10, 20, 30\}$.

¹ https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html

² http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes


Figure 2: Classification and clutter detection results for the University of Pavia scene. (a) and (e) RGB composite [17]
and manual classification labeling and mask. (b)-(d) 5% labeled data sampling: masked classification; classification
entropy map; classification and clutter removal with $\tau_{clt}$ = 0.15. (f)-(h) 2% labeled data sampling: same as (b)-(d)
with $\tau_{clt}$ = 0.15.


The resulting ensemble comprised a single instance for each element in the Cartesian product of algorithmic
parameter sets, for a total of 54 embeddings and low-dimensional representations of the HSI data. Nearest-neighbor
searching was bounded using a 51 x 51 sliding window. Labels were acquired via pixel-wise nearest-neighbor
classification for each instance and non-weighted consensus for the ensemble.
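
The ensemble size can be checked by enumerating the Cartesian product of the parameter sets. In the sketch below, treating the {whole, odd, even} spectrum choices as one three-valued axis is an interpretation made here; it is consistent with the reported total of 54 instances. The dictionary keys are hypothetical names.

```python
from itertools import product

spectral_subsets = ["whole", "odd", "even"]   # spectral sampling of the bands
box_sizes = [3, 5]                            # p: side of the spatial box filter
neighborhood_sizes = [5, 10, 15]              # k: size of local manifold neighborhoods
manifold_dims = [10, 20, 30]                  # d: dimension of the classification space

# One ensemble instance per combination of parameter values.
ensemble = [
    {"spectrum": s, "p": p, "k": k, "d": d}
    for s, p, k, d in product(spectral_subsets, box_sizes,
                              neighborhood_sizes, manifold_dims)
]
num_instances = len(ensemble)
```

Each dictionary would configure one run of feature embedding, LLE, and nearest-neighbor labeling, and the resulting labels feed the consensus and entropy computations.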

Reference labeled data for classification were sampled uniformly at random, using the reference labeled mask to 
extract samples at 10% or 5% density per class for the Indian Pines dataset, and at 5% or 2% density per class for the 
University of Pavia dataset. The label-set size ranged from approximately 120 to just 1 pixel per class, depending on 
the relative coverage of the HSI scene. 


5.3 Rendering schemes 

We render experimental results in three ways. First, we follow the conventional scheme, where only pixels that 
belong to a class in the reference label map are considered—the rest are discarded, regardless of the corresponding 
classification results. Quantitative results are provided using the standard overall accuracy (OA; percentage of correctly 
classified pixels) and average accuracy (AA; average of class-wise classification accuracy percentages) metrics. 
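
The two metrics differ in how they weight classes, which a small sketch makes concrete. The function names and the toy labels are illustrative; only pixels with a reference label are assumed to be passed in.

```python
def overall_accuracy(truth, predicted):
    """Percentage of correctly classified pixels."""
    correct = sum(t == p for t, p in zip(truth, predicted))
    return 100.0 * correct / len(truth)

def average_accuracy(truth, predicted):
    """Mean of the per-class accuracy percentages."""
    per_class = {}
    for t, p in zip(truth, predicted):
        hits, total = per_class.get(t, (0, 0))
        per_class[t] = (hits + (t == p), total + 1)
    return 100.0 * sum(h / n for h, n in per_class.values()) / len(per_class)

# A large class classified perfectly and a small class classified half right.
truth = ["corn"] * 8 + ["oats"] * 2
predicted = ["corn"] * 8 + ["oats", "corn"]
oa = overall_accuracy(truth, predicted)
aa = average_accuracy(truth, predicted)
```

Here OA is dominated by the large class, while AA gives the small class equal weight, so errors on rare classes lower AA more than OA.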

While the OA and AA metrics allow comparisons with a reference (manual) classification result, they cannot 
capture other aspects of the classification problem, and provide no information as to the separation of clutter and 
labeled samples. Hence, in the absence of available reference data for the whole scene, we resort to visual appraisal of 
the classification and clutter detection results using the other two rendering schemes. 

One is a gray-scale rendering of the classification entropy (clutter estimate) map; ideally, it should be dark for 
labeled regions and bright for clutter. Last, we render the final classification results with our approach, by merging 
the ensemble consensus labeling with a clutter mask, obtained by thresholding the clutter estimate image. Good
results should have the following qualities: each region is classified correctly, region boundaries are respected by the
classification map, and clutter is accurately identified.

Table 2: Classification accuracy for the embedding ensemble and instances.

                     Labeled   Instances [mean ± std (max)]                   Ensemble
Dataset              samples   OA (%)                  AA (%)                 OA (%)   AA (%)
Indian Pines         5%        85.79 ± 4.12 (92.88)    82.06 ± 5.68 (91.66)   95.39    94.85
Indian Pines         10%       90.00 ± 3.57 (96.07)    87.68 ± 4.87 (95.45)   97.34    97.13
University of Pavia  2%        94.87 ± 2.38 (97.86)    92.29 ± 3.68 (97.13)   98.84    98.42
University of Pavia  5%        96.92 ± 1.76 (98.99)    95.41 ± 2.47 (98.43)   99.60    99.32

5.4 Results 

A summary of the classification accuracy metrics for both HSI datasets, measured with respect to the corresponding 
manually labeled mask, is shown in table 2, for the embedding instances as well as the ensemble. We can see that the 
ensemble outperforms all instances, having a significant margin from the majority of the latter. This is especially true 
for the Indian Pines dataset, which proves to be more difficult than the University of Pavia one, due to the spectral 
overlap between different classes and very low spatial resolution, which means that there may be substantial variability 
among pixels of the same class. For both datasets, very high classification accuracy is attained. Note, however, that 
these metrics only take a portion of the image into account. 

Results for the Indian Pines dataset are displayed in fig. 1. It can be readily seen in figs. 1b and 1f that classification
errors are mostly localized around a couple of difficult regions. Nevertheless, the clutter estimate maps clearly capture
the outline structure of the scene, and many of the mis-classified regions are acknowledged as somewhat ambiguous.
Looking at the fused classification-clutter images in figs. 1d and 1g, we can already see the efficacy of the proposed
methodology: the overall structure of the manual label-mask is recovered nicely, albeit without particularly sharp 
features. In addition, we are able to recover regions that were not labeled, although they rather clearly extend beyond 
the manually drawn boundaries: for example, notice the woods area (red) towards the bottom-right corner, highlighted 
with a superimposed rectangle. 

Corresponding results for the University of Pavia dataset are shown in fig. 2. Here, we attain near-perfect
classification results when compared to the manual labeling. More importantly, however, we seem to be able to recover
a very high-fidelity profile of the whole scene, without any prior assumptions about the distribution of clutter pixels.
Indeed, objects belonging to labeled classes are identified inside unlabeled regions, and figs. 2c and 2h appear to
provide a much more accurate view of the scene than even the manually labeled mask. For example, two such regions
are highlighted, where a stretch of road and a set of trees are identified in the unlabeled regions, reflecting the view of
the composite color image with high fidelity. While it can be seen that fig. 2d does perform better than fig. 2g, it is 
noteworthy that the vast majority of the scene structure is recovered using reference samples with 2% density. 

6 Discussion 

We have presented a new approach for HSI classification and clutter detection via employing an algorithmic ensemble 
of structural feature embeddings, nonlinear dimension reduction with the LLE method, and a classifier to be used 
in the low-dimensional manifold parameter space. For feature embedding, we have used only a few simple types of 
feature transform functions to explore and exploit the spatial and spectral coherence structure in the HSI data. These 
simple steps, following the isometric principles of manifold structures, have rendered remarkable results for the two 
datasets studied in this paper, while each step may be easily modified or customized to suit a particular application 
context, if necessary. Presently, the parameter ranges for manifold dimension estimation and the number of neighbors 
are prescribed. A desirable extension is to have such ranges determined automatically and adaptively for each dataset. 
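
As a rough illustration of how an ensemble over prescribed parameter ranges could be integrated, the sketch below enumerates (dimension, neighbors) pairs and fuses per-pixel votes by plurality, flagging weak consensus as clutter. Everything specific here is an assumption: the range values, the `min_agreement` threshold, the function name, and the plurality rule itself are hypothetical and are not claimed to be the integration scheme used in our experiments.

```python
from collections import Counter
from itertools import product

# Hypothetical prescribed ranges; the experimental values are not restated here.
DIMENSIONS = (10, 20, 30)   # candidate manifold dimensions
NEIGHBORS = (15, 25, 35)    # candidate numbers of LLE neighbors


def fuse_votes(votes_per_pixel, min_agreement=0.6):
    """Plurality-vote fusion; pixels with weak consensus are flagged as clutter (None)."""
    fused = []
    for votes in votes_per_pixel:
        label, count = Counter(votes).most_common(1)[0]
        agreement = count / len(votes)
        fused.append(label if agreement >= min_agreement else None)
    return fused


# One ensemble instance per (dimension, neighbors) pair.
instances = list(product(DIMENSIONS, NEIGHBORS))

# Toy votes: one list per pixel, one vote per ensemble instance (hypothetical).
votes_per_pixel = [
    [3, 3, 3, 3, 3, 3, 3, 3, 3],  # unanimous
    [3, 3, 3, 3, 3, 3, 5, 5, 5],  # clear majority
    [1, 2, 3, 4, 5, 1, 2, 3, 4],  # no consensus
]
fused = fuse_votes(votes_per_pixel)
print(len(instances), fused)
```

An adaptive variant would replace the two fixed tuples with ranges derived from each dataset, leaving the fusion step unchanged.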

We have given our rationale for utilizing LLE at the core of our approach. The LLE method can be connected 
to multiple methods for classification, segmentation, or clustering. While various extensions to LLE and alternative 
related approaches to manifold derivation exist, we have found LLE to be as good as or superior to them, while offering 
a particularly simple computational structure. There is still more to be understood regarding the behavior of these methods 
and their connections to one another. 


A Reference labeling for the HSI datasets 

The reference classification data (typically used as ground truth) for the Indian Pines^ and University of Pavia"^ scenes 
are shown in figs. 3 and 4. The unlabeled regions account for 51% of the entire image domain for the former, and 80% 
for the latter (see table 1). 
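
The unlabeled share of a scene can be read directly off its reference label map, as in the short sketch below. The function name, the choice of 0 as the unlabeled value, and the toy map are assumptions for illustration; the toy map is not an excerpt of either dataset.

```python
def unlabeled_fraction(label_map, unlabeled=0):
    """Fraction of pixels in a 2-D label map that carry no reference label."""
    pixels = [label for row in label_map for label in row]
    if not pixels:
        raise ValueError("empty label map")
    return sum(1 for label in pixels if label == unlabeled) / len(pixels)


# Toy 3x4 label map (hypothetical): 0 = unlabeled, 1 and 2 = class labels.
toy_map = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 0, 0, 2],
]
fraction = unlabeled_fraction(toy_map)
print(f"{100 * fraction:.0f}% of the pixels are unlabeled")
```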




Figure 3: Available labeling information for the Indian Pines scene. (a) Reference labeling map (16 colored classes 
and black unlabeled regions). (b) Class color legend: Alfalfa, Corn-notill, Corn-mintill, Corn, Grass/Pasture, 
Grass/Trees, Grass/Pasture-mowed, Hay-windrowed, Oats, Soybeans-notill, Soybeans-mintill, Soybeans-clean, Wheat, 
Woods, Bldg-Grass-Tree-Drives, Stone-Steel-Towers. 





Figure 4: Available labeling information for the University of Pavia scene. (a) Reference labeling map (9 colored 
classes and black unlabeled regions). (b) Class color legend: Asphalt, Meadows, Gravel, Trees, Painted metal sheets, 
Bare soil, Bitumen, Self-blocking bricks, Shadows. 


^https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html 

"^http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes 
Table 3: Classification accuracy for the embedding ensemble and instances without feature embedding. 

                      Labeled   Instances [mean ± std (max)]                  Ensemble
Dataset               Samples   OA (%)                 AA (%)                 OA (%)   AA (%)
Indian Pines          5%        76.30 ± 3.11 (73.31)   74.37 ± 2.66 (77.42)   82.09    78.67
Indian Pines          10%       80.18 ± 3.09 (83.14)   79.63 ± 2.66 (83.00)   85.83    83.77
University of Pavia   2%        95.18 ± 1.55 (96.86)   94.44 ± 1.78 (96.27)   97.53    96.89
University of Pavia   5%        96.82 ± 1.17 (97.84)   96.11 ± 1.29 (97.27)   98.60    98.13


B Experiments without feature embedding 

We have presented a comparison of classification results between the embedding instances and ensemble in table 3. 
In addition to the superior classification accuracy, the ensemble scheme also enables the provision of the clutter map. 
Here, we provide experimental results that factor out and highlight the effect of feature space embedding prior to 
employment of the LLE method. 

In particular, we carry out a set of parallel experiments to those of section 5 without application of the spatial-spectral 
filters; the ensemble size is consequently reduced to 9. Results for the two datasets are shown in figs. 5 and 6, 
respectively; these are analogous to figs. 1 and 2. Table 3 summarizes the attained classification accuracy, in the same 
format as the corresponding table for the experiments with feature embedding. Evidently, the experiments with feature 
embedding yield higher classification accuracy, as well as sharper 
clutter maps and labeled region boundaries. 
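
The "mean ± std (max)" entries for the instances can be produced from per-instance accuracies as sketched below. The function name and the accuracy values are hypothetical, and the use of the population standard deviation is an assumption; the text does not state which estimator underlies the reported figures.

```python
from statistics import mean, pstdev


def summarize_instances(accuracies):
    """Return (mean, std, max) of per-instance accuracies, in percent."""
    # pstdev is the population standard deviation; use statistics.stdev
    # instead if the sample estimator is intended.
    return mean(accuracies), pstdev(accuracies), max(accuracies)


# Hypothetical per-instance overall accuracies (%), not values from our experiments.
instance_oa = [78.0, 80.0, 82.0]
avg, std, best = summarize_instances(instance_oa)
print(f"{avg:.2f} ± {std:.2f} ({best:.2f})")
```

Comparing the ensemble figure against `best` rather than `avg` shows whether integration improves on even the most favorable single parameter setting.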

We remark also on the extent of improvement that may be gained by feature space embedding. From the class legends 
provided in appendix A, one may expect a significant difference between the two datasets, with regard to inter-class 
similarities. Indeed, spectral signatures in the Indian Pines scene are very similar between certain classes (such as 
different corn fields, soybean areas, or grass patches), whereas classes in the University of Pavia scene feature more 
distinctive signatures in comparison. This difference between the datasets means that the former presents a greater 
challenge to conventional discrimination metrics, and thereby benefits more from feature space embedding, which 
effectively amounts to an adaptive transformation of the distance metric in the original feature space. Such benefits 
are confirmed by our experimental results. 
sampling: same as (b)-(d) with Tdt = 0.15. 